[57] ABSTRACT

Computer-stored text, such as numerical information, is 
processed by a word list generator to develop a word list 
corresponding to those words that are to be spoken by the 
system. The word list generator assigns a prosodic environ- 
ment state or token to each entry in the list. The prosodic 
environment identifies how the word functions in its current 
prosodic context. Different intonations are applied based on 
the prosodic environment. Next, the preceding and adjacent 
words are examined to determine how each word may need 
to be pronounced differently, based on the ending phoneme 
of the preceding word and the beginning phoneme of the 
following word. Using this phonological information along 
with the prosodic information, a sample list is generated by
accessing a dictionary of stored samples. The sample list is 
then serially played through suitable digital-to-analog con- 
version circuitry to generate the text-to-speech output. The 
result is a natural, human-like reading, complete with
appropriate intonation changes suitable to the context of the text
material. 

12 Claims, 4 Drawing Sheets 
HIGH QUALITY CONCATENATIVE
READING SYSTEM 

BACKGROUND AND SUMMARY OF THE 
INVENTION 

The present invention relates generally to text-to-speech 
(TTS) reading systems. More particularly, the invention 
relates to a concatenative reading system that produces high
quality, naturally articulated speech by taking into account 
the prosodic environment of the words to be concatenated 
and also the phonological features of adjacent words to 
provide natural-sounding intonation. The system is particu- 
larly useful in reading numbers in tables, spreadsheets and 
the like. 

In the process of data entry into a computer from written 
records, proofreading is a tiring and time-consuming task. 
The data entry operator must constantly shift the eyes 
between the computer screen and the paper originals. 
Sometimes, if two people are available, they can share the 
proofreading task: one person reading the data out loud from 
the paper originals and the other checking the entry on the 
computer screen. 

This process of proofreading data entry can be facilitated 
through use of a speech synthesis system. Such a system 
allows the operator to keep the eyes on the paper originals 
while listening to what has been entered. The operator does
not need another person to read the data from the paper 
originals because the speech synthesis system handles this 
aspect. Thus the operator can work alone. However, current
speech synthesis systems are fatiguing to use, because 
speech quality is poor, lacking natural-sounding phrasing 
and intonation. User fatigue leads to errors. Hence current 
speech synthesis systems have proven deficient for critical 
proofreading applications. User fatigue is particularly preva- 
lent in number reading systems, where a monotonous tone 
and poor phrasing lead to many errors.

The present invention provides a reading system that has 
a very natural voice with which the data entry operator can 
work without fatigue. The reading system employs a
concatenative technique whereby digitally recorded speech
samples are concatenated or joined together to produce the 
speech output. The invention achieves a more life-like 
output by incorporating two variables of natural speech: (1) 
prosodic or intonational variation and (2) variation due to 
coarticulation of each word's initial and final phonemes with 
the final and initial phonemes of adjacent words. For each 
use of a word, a set of prosodic and segmental environment 
rules are applied to select a contextually appropriate digital 
sample. The result is a much more natural sounding syn- 
thesized speech that does not induce fatigue. Operators 
using the system thus enjoy a much lower error rate. 

The system of the invention captures what a human 
speaker does while proofreading. It reads numbers in a 
column or row, using a nonfinal intonation for all but the last 
entry. This intonation gives the listener a cue that the current 
number is not the final one in the column or row. This 
contextual cue is extremely helpful in proofreading, as the 
user is cued when the final number in the column or row is 
reached. This information is very valuable in detecting 
insertion and deletion errors, where the text on the computer 
screen and the text on the paper originals do not have the 
same number of entries due to data entry error. 

The invention comprises a high-quality concatenative 
reading system for converting an input string into a sequence 
for subsequent audible synthesis. The invention includes a 
dictionary of words stored in a computer-readable storage 
medium and a word list generator coupled to the dictionary.
The word list generator is receptive of the input string for 
building and storing entries in a word list within the
computer's memory. The word list generator builds the word list
from words stored in the dictionary to correspond to the
input string. The generator has a set of stored rules for
adding numeric placeholder words that correspond to integers
in the input string. Thus the word list generator will
insert the appropriate numeric placeholders so that the
integer number "1,243" will be pronounced "one thousand,
two hundred forty-three."
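
As a rough illustration of how an integer might be divided into the ranges that receive placeholder words such as "thousand," the following sketch splits a value into groups of digits. The function name `split_into_groups` and its `base` parameter are assumptions made for this example and are not taken from the described system.

```python
def split_into_groups(value: int, base: int = 1000) -> list[int]:
    """Split a non-negative integer into groups, most significant first.

    With base=1000 the groups are the familiar English triplets.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = []
    while True:
        value, remainder = divmod(value, base)
        groups.append(remainder)
        if value == 0:
            break
    return groups[::-1]


# 1,243 splits into a thousands group (1) and a ones group (243).
print(split_into_groups(1243))
```

Passing `base=10000` instead would yield four-digit groups, which may suit languages that group numbers differently.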

The word list generator further includes a list of prosodic 
environment tokens that represent a plurality of intonation 
types. The word list generator assigns at least one of the
prosodic environment tokens to at least some of the word list
entries. The preferred embodiment assigns a prosodic
environment token to each of the words in the word list.

The reading system also includes a database of speech 
samples stored in computer-readable memory. A phonologi- 
cal feature analyzer analyzes the word entries in the word list 
to determine the phonological environment of those words.
Specifically, the preferred embodiment consults a phono- 
logical feature table to determine what each word begins 
with and ends with. These features are compared with 
adjacent words to determine the phonological environment 
of each word. In natural speech, phonemes are pronounced 
differently in different phonological contexts. The adjacent 
phonemes affect how a phoneme will sound when spoken. In 
this case, the invention concentrates on the beginning and
ending phonemes, altering the pronunciation based on the
words that precede and follow each word entry.
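
The lookup of beginning and ending features might be sketched as follows. The table contents, the coarse feature labels ("s", "vowel", "other") and the function name `phonological_context` are all hypothetical; the specification does not list the actual features used.

```python
# Hypothetical feature table: word -> (beginning feature, ending feature).
# The labels are illustrative placeholders, not the patent's feature set.
FEATURE_TABLE = {
    "one": ("other", "other"),
    "two": ("other", "vowel"),
    "six": ("s", "s"),
    "seven": ("s", "other"),
    "hundred": ("other", "other"),
}


def phonological_context(words, index):
    """Return (ending feature of the preceding word,
    beginning feature of the following word) for words[index].

    None is returned for a missing neighbor at either edge of the list.
    """
    prev_end = FEATURE_TABLE[words[index - 1]][1] if index > 0 else None
    next_begin = (
        FEATURE_TABLE[words[index + 1]][0] if index + 1 < len(words) else None
    )
    return prev_end, next_begin
```

For the word list `["six", "hundred", "seven"]`, the entry "hundred" would see a preceding word that ends with "s" and a following word that begins with "s".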

Using the word list constructed by the word list generator,
together with the prosodic environment information and
phonological feature information, the reading system
constructs a sample list from the database of speech samples.
The sample list represents the actual sampled data that are
concatenated to supply the sequence for audible synthesis.
The sample list may be output through a digital-to-analog
converter to produce an audible signal that may be amplified
and played through a suitable speaker system.
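
A minimal sketch of the concatenation step, assuming each sample is held as raw 16-bit mono PCM bytes and that all samples share one sampling rate. Instead of driving a digital-to-analog converter directly, this example wraps the joined data in a WAV container using Python's standard `wave` module; the function name and the default rate of 11025 Hz are assumptions made for illustration.

```python
import io
import wave


def concatenate_samples(sample_list, sample_rate=11025):
    """Join raw 16-bit mono PCM chunks in order and return WAV file bytes."""
    pcm = b"".join(sample_list)  # serial concatenation, in list order
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes could be written to a file or handed to whatever audio playback facility the host system provides.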

For a more complete understanding of the invention, its 
objects and advantages, reference may be had to the follow- 
ing specification and to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating the presently pre- 
ferred architecture of the computer-implemented number 
reading system; 

FIG. 2 (FIGS. 2a and 2b, collectively) is a flowchart
diagram showing the computer-implemented process
performed by the number reading system of the preferred
embodiment;

FIG. 3 is a data structure diagram illustrating the presently
preferred data structures generated by and manipulated by 
the number reading system of the invention to produce 
high-quality concatenated synthesized speech. 

DESCRIPTION OF THE PREFERRED
EMBODIMENT 

The number reading system is depicted diagrammatically
in FIG. 1. The system is designed to be implemented by a
computer that has been programmed in accordance with the
software system described herein. In FIG. 1 the computer 10
with monitor 12 has been illustrated. Displayed on monitor
12 is a target application 14, such as a spreadsheet
application, that will be the source of input to the number
reading system of the invention. Computer 10 includes a
suitable speaker system. In the illustrated embodiment,
speakers 16 are disposed on the left and right sides of the
monitor. Of course, other speaker locations are also possible.

FIG. 1 also illustrates some of the more important internal
components of computer system 10. These include the
central processing unit or CPU 20, random access memory
or RAM 22 and a disk drive system including disk storage
24 and suitable disk interface circuitry such as SCSI
circuitry 26. These components are connected together by
means of the computer bus forming a part of computer
system 10. The preferred embodiment is designed to work
with a commercially available sound card 26. The sound
card includes suitable digital-to-analog conversion circuitry
28 for converting digitally recorded samples into analog
signals that may be amplified by amplifier 30 for playing
through speaker 16. The sound card preferably connects to
the CPU 20 by attachment to the computer bus, as
illustrated. The sound card can be, for example, a commercially
available Soundblaster card available from Creative Labs,
Inc.

The computer system 10 is programmed to implement
several functional modules. These are illustrated in FIG. 1
and will be described next.

The reading system of the invention is a concatenative
reading system. Concatenation is the process of stringing
together or combining individual speech samples into a
sequence. The individual speech samples each represent
discrete units of speech, such as phonemes or words. The
individual samples are strung together to produce a single
sequence that, when played through the sound card at the
proper sampling rate, produces sound that simulates speech.
Although concatenative speech systems are known, the
present invention greatly improves upon existing
concatenative speech techniques by taking into account prosodic
environment and phonological features. The reading system
of the invention uses these attributes to generate
natural-sounding speech, having the appropriate pronunciation,
intonation, inflection and phrasing for the given context. The
result is more natural speech that is less fatiguing to listen to.

The reading system has a dictionary of sampled sounds
40. These are digitally sampled sounds that have been
recorded and stored in advance. The sampling rate used to
digitize the speech samples can be selected based on system
requirements. If memory resources are limited, a lower
sampling rate (e.g. 11 kilohertz) may be used. For higher
quality speech a higher sampling rate (e.g. 22 kilohertz) may
be used. If compact disc quality audio is desired, a still
higher (e.g. 44.1 kilohertz) sampling rate may be used. If
desired, the dictionary of samples can include separate
dictionaries of sampled sounds, sampled at different
sampling rates. The reading system could be provided with a
suitable button in the user interface control system to select
which dictionary should be used. In general, the dictionary
of samples comprises a complete collection of all possible
sounds that the concatenative reading system may string
together. For number reading systems having a relatively
limited vocabulary (i.e., a limited number of possible words
the system can pronounce) the dictionary entries can be
individual words. In more complex reading systems, where
a larger vocabulary must be supported, the dictionary of
samples may store more elemental speech components, such
as individual phonemes. Whether to store entire words or
individual phonemes is largely a system design issue. The
system designer should select the appropriate "granularity"
or dictionary entry size to suit the specific application.

The presently preferred embodiment has been optimized
as a number reading system. Hence each entry in the
dictionary of samples 40 is a digitally recorded sample of an
individual word. As will be more fully explained below, the
present system achieves a more natural sounding output by
taking prosodic environment and phonological features into
account. When a human reads from a prepared text, the
spoken words are given different intonation or voice pitch,
depending on how they are used and where they appear in
the prepared text. The human reader instinctively alters the
intonation according to the prosodic environment of the
words being spoken. The change in intonation provides a
powerful cue to the listener, conveying prosodic
information, such as where phrases begin and end, or where
sentences begin or end, or where columns of numbers begin
and end. These prosodic cues are not limited to phrase and
sentence structure. They also convey important information
when reading numbers. In the English language, numbers
are naturally subdivided into triplets. These triplets are
commonly punctuated with commas (e.g. 1,234). When
reading numbers in the English language, the reader uses
different voice pitch or intonation on the different individual
digits, according to where they appear in the overall number.
Thus the preceding example (1,234) would be pronounced
"one thousand, two hundred thirty-four," with different pitch
contour placed on the beginning digit (1), the digit following
the commas (2) and the ending digit (4).

Because the individual words may have different pitch
contours, depending on the prosodic environment, the
dictionary of samples 40 includes a different sample for each
pitch contour. Thus, in a number reading system that is
designed to simulate spoken English, three different
intonations or pitch contours may be employed: an initial
intonation, a pre-pausal intonation and a final intonation.
Thus each word in the dictionary would be stored three
times, one for each intonation.

To further refine the output quality, the number reading
system also takes into account the fact that a human reader
will pronounce phonemes differently, depending on what
sounds immediately precede and follow that phoneme.
These are phonological features that give synthesized speech
a more natural, human-like quality. The concatenative
reading system of the invention analyzes each of the
concatenated elements (e.g. phonemes or words) to select the
proper sound based on the element's adjacent neighbors. In
the preferred embodiment that has been optimized for
number reading, individual words are stored in the dictionary 40
in a variety of forms, corresponding to the different
pronunciations that may be required in certain phonological
settings. Thus, in addition to storing one entry for each of the
prosodic environments, dictionary 40 also stores all
pronunciation variants of the word for each prosodic environment.
Thus, regardless of what word precedes or follows a given
word and regardless of what the prosodic environment may
be, the dictionary 40 contains a sample to match.

Returning to FIG. 1, the number reading system employs
a word list generator 42 that performs the first pass of the
two-pass system of the preferred embodiment. Word list
generator 42 accesses an input buffer 44 containing the text
to be converted to speech. The input buffer can be loaded
with text through any suitable mechanism. For example, the
input buffer can be loaded by copying data from the target
application 14. Word list generator 42 includes a prosodic
environment table 46 that identifies the different possible
prosodic environment states of the implementation. The
preferred embodiment defines three prosodic environment
states, an initial state, a final state and a pre-pausal state.
These three states work well in number reading applications
for which the current preferred embodiment has been
optimized. Of course, a system may be constructed with a larger
or fewer number of prosodic environment states, depending
on the application.

Word list generator 42 processes the text stored in input
buffer 44 to build a word list. This is stored in a word list
buffer 48. The process will be described more fully below in
connection with FIGS. 2 and 3.

Essentially, the word list is a list containing a word token
for each word that will need to be synthesized in the output
speech. These word tokens are arranged in the order they
will be pronounced in the output speech. Associated with
each word is a prosodic environment token, to signify the
prosodic state of each word in the word list.

The reading system further comprises a phonological
feature analyzer 50. The feature analyzer includes a
phonological feature table 52 that identifies, for every word in the
dictionary, what the word begins with and ends with. The
phonological feature analyzer analyzes each entry in the
word list buffer, using the phonological feature table 52 to
examine the words that precede and follow each entry in the
word list buffer. Using this information the phonological
feature analyzer accesses the dictionary of samples 40 to
build a sample list. The sample list is stored in the sample list
buffer 54. Stored in the sample list buffer are the actual
digital samples corresponding to the appropriate prosodic
environment and phonological features of the words in the
word list buffer. The sample list buffer may then be serially
output through the digital-to-analog converter 28 to produce
the audible speech output signal.

The word list generator 42 and phonological feature
analyzer 50 effect a two-pass conversion process. This
process is depicted in FIG. 2. During the first pass the word
and prosodic environment list is generated. During the
second pass the sample list is generated. In FIG. 2 the first
pass begins at Step 80 and the second pass begins at Step 90.

After the input text has been loaded into the input buffer
44 the word list generator performs a preprocessing step
(Step 82). In the preprocessing step the text in the input
buffer is cleaned up to remove or standardize any user
punctuation marks. Thus, in a number reading system the
preprocessing step will clean up commas, hyphens and slash
marks, making them consistent throughout the text. These
punctuation marks can serve as prosodic cues to denote
where pauses or other words should be injected. For
example, the hyphen may be read as "minus" and the slash
mark may be read as "divided by." Commas may signify
how a number is divided into triplets.

Next, (Step 84) the integer portion of the input string is
converted to numerical values. To explain, numbers written
in text appear as ASCII characters representing the
individual digits of the number. So that the number can be
properly processed for text-to-speech conversion, the ASCII
representation must be converted into a numerical
representation. In effect, the ASCII character string representing the
ordinal numbers is converted into an integer form that the
computer will treat as an integer data type.

After conversion of the number into a numerical value,
the number is normalized into ranges in Step 86. In English
language text-to-speech applications, numbers are typically
normalized or grouped into triplets. Other languages may
group numbers into different ranges. For example, in
Japanese, the numbers are grouped into groups of four
digits. By grouping the numerical value into ranges such as
triplets, the word list generator is then able to insert the
appropriate placeholder words in the word list. Thus, the
numerical value "1,200" would generate the word list
"one-thousand-two-hundred."

Accordingly, in Step 88 the word list generator generates
the word list, storing it in the word list data structure 120.
The word list data structure, described more fully below in
connection with FIG. 3, is stored in the word list buffer 48.
As illustrated, the word list data structure comprises an array
of ordered pairs. Each ordered pair comprises a token,
representing a word to be spoken, and the prosodic
environment state of that word. The word list generator
determines the prosodic environment state by accessing the
prosodic environment table 46 to assign an environment
token indicating what intonation should be used when
pronouncing the word in its current prosodic environment.
Specifically, the word list generator examines the word in
relation to its location within the text of the input buffer and
its relation to placeholder punctuation marks. If the word
appears as the initial word in a numerical value, it is assigned
an initial state token. If the word is the final word in the
numerical value it is assigned a final state token; and if the
word meets other criteria, it is assigned a pre-pausal state
token. The precise rules for assigning environment state
tokens are set forth in the pseudocode appearing in the
Appendix. The pre-pausal state is referred to as the
"comma" state in the pseudocode.

After the word list has been constructed, the second
processing pass begins at Step 90. Essentially, the second
pass examines each entry in the word list and builds a
sample list. Starting at the head of the word list and
continuing through the list until the end is reached, a
processing loop is performed. (See Steps 92-110). In Step
94 the word and prosodic environment tokens are read from
the current entry in the word list. Next, in Step 96, the
phonological features for the current entry are determined
using the phonological feature table 52. Specifically in this
step, the words adjacent (preceding and following) the
current word are examined. The end of the preceding word
and the beginning of the following word are used to access
the phonological feature table 52 to determine the
phonological feature state of the current word.

In Step 100 the current word, its prosodic environment
attribute and its phonological feature attribute are used to
look up and copy the appropriate digital sample into a
sample list data structure 122. The sample list data structure
is stored in the sample list buffer 54. Essentially, this step
builds the sample list by selecting the digital sample having
the appropriate sound and intonation for the current context.
After adding the digital sample entry to the sample list, the
procedure (in Step 110) indexes the current entry pointer to
the next entry in the word list. The procedure then branches
back to Step 94 where the cycle is repeated over and over,
until the last entry in the word list is processed.

After the last entry in the word list has been processed, the
sample list contains a full sequence of all digital samples
needed for a concatenative playback. This is illustrated at
Step 112, where the sample list is played by sequentially
outputting the samples through the digital-to-analog
converter in the order stored. This results in a concatenated
synthesized speech signal that may be amplified and played
through the speaker 16.

To further illustrate the invention in its preferred
embodiment, FIG. 3 shows some of the data structures that
are used in the current implementation. The data structures
are physically implemented as objects in the computer
random access memory 22. Word list 120 (stored in the word
list buffer 48) is essentially an array of integers to which
codes or tokens are assigned to represent the words in the
system's finite vocabulary. The word list comprises a set of
ordered pairs, each ordered pair comprising a token to
represent a word from the dictionary of samples 40 and a
prosodic environment state token associated with that word.
The prosodic environment state tokens are described in the
environment table 46. In the preferred embodiment three 
states are recognized, initial, final and pre-pausal. Of course, 
systems may be implemented using a different number of 
prosodic environment states if desired. Word list 120 is 
populated with data by word list generator 42 during the first 
processing pass. 
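
One way to picture the word list is as a sequence of (word token, prosodic environment token) pairs. The sketch below is an assumption-laden illustration: the names `Prosody`, `WordEntry`, `VOCABULARY` and `WORD_TOKEN`, the integer codes, and the particular states assigned to each word are hypothetical and are not taken from the specification.

```python
from enum import IntEnum
from typing import NamedTuple


class Prosody(IntEnum):
    """Prosodic environment states of the preferred embodiment."""
    INITIAL = 0
    FINAL = 1
    PRE_PAUSAL = 2  # referred to as "comma" in the Appendix pseudocode


# Hypothetical vocabulary; a word's token is its index in this list.
VOCABULARY = ["one", "two", "hundred", "thousand"]
WORD_TOKEN = {word: token for token, word in enumerate(VOCABULARY)}


class WordEntry(NamedTuple):
    word_token: int
    prosody: Prosody


# A possible word list for "1,200"; the state assignments are illustrative.
word_list = [
    WordEntry(WORD_TOKEN["one"], Prosody.INITIAL),
    WordEntry(WORD_TOKEN["thousand"], Prosody.PRE_PAUSAL),
    WordEntry(WORD_TOKEN["two"], Prosody.INITIAL),
    WordEntry(WORD_TOKEN["hundred"], Prosody.FINAL),
]
```

Because both members of each pair are small integers, the whole list can be stored compactly as an array of integers, as the text describes.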

The second processing pass involves populating the 
sample list data structure 122 that is stored in the sample list 
buffer 54. The phonological feature analyzer 50 identifies 
the word preceding and the word following the current entry 
and accesses the phonological feature table 52 to ascertain 
what the word begins with and ends with. With this
information the phonological feature analyzer then selects the
appropriate sample from dictionary 40. Selection of the 
appropriate sample involves knowing three pieces of infor- 
mation: the word identifier or token, the prosodic environ- 
ment associated with that entry and the phonological fea- 
tures that affect pronunciation of the entry in its current 
context. The word identifier token and prosodic environment 
information come from the word list. The phonological 
feature information is obtained by accessing the feature table 
as described above. With this information the proper sample 
is identified and extracted from dictionary 40. The sample is
placed into the sample list data structure 122 as a digital 
sample that will be later played through the sound card and 
associated audio equipment. 

From the foregoing it will be appreciated that the present 
invention provides a concatenated reading system that com- 
bines prosodic environment and phonological information to 
achieve a more natural, human-like reading. Although the 
present invention has been illustrated and described with 
reference to a number reading system, it will be apparent 
that the techniques employed in the illustrated embodiment 
can be applied to other types of reading systems. 
Accordingly, it will be understood that the invention is 
capable of certain modification or change without departing 
from the spirit of the invention as set forth in the appended 
claims. 

APPENDIX 

Objects in Memory: 

A list of "words"; an array of integers, which can be 
assigned codes representing the words in the system's finite 
vocabulary. 

A list of "prosodic environments"; an array of integers, 
where each one can be assigned a code representing one of 
a class of intonational types, i.e., "initial," "final," "comma," 
etc. 

A table of phonological features; an array of phonological
features belonging to each of the words in the system's
vocabulary. Features represent the type of phonemes which
words may begin and end with, so that sample features
might be "ends with a vowel" or "begins with an S."

A set of recordings (or "samples") of each of the words in 
the system's vocabulary, with multiple recordings of each as 
is appropriate for different environments. 

A list of "samples" to be played back in sequence through 
the audio device. 

Module I: Construct Word and Prosodic Environment List 
Several different modules may be used here, depending on 
what types of numbers and statements are to be generated. 
As an example, we present pseudocode for generating
integers between 1 and 999,999,999.

Where I speak of "adding word X with intonation Y," this
means that new entries are made in the word list and
prosodic environment list as described above, and that these
are assigned the values X and Y, respectively.

Clean up input string. 

Convert integer portion to a number value. 

For each triple (millions, thousands, ones), do the following:
(for example, if the integer supplied is 123,049,228,
make three passes through this loop using 123, 49 and 228).

If hundreds place is not zero, add "one"-"nine" plus
"hundred"; select final intonation if tens and ones places are
both zero, and we're in the ones triple, otherwise neutral
intonation.

If tens place is one and ones place is nonzero:
    add appropriate "teen" words; select final intonation if
    this is the ones triple, otherwise neutral intonation.
Else:

    If tens place is nonzero, add "ten," "twenty," . . . "ninety,"
    depending on this value; use final intonation if ones place is
    zero and this is the ones triple, otherwise neutral intonation.

    If ones place is nonzero, add "one," "two," . . . "nine,"
    depending on its value; use final intonation if this is the
    ones triple, otherwise neutral intonation.

If this is the millions triple, add the word "million"; if this
is the "thousands" triple, add the word "thousand." If this is
the millions triple and last six digits of the number are
000000, or if this is the thousands triple and the last three
digits of the number are 000 (in other words, if the value of
the whole integer divided by the base leaves a remainder of
zero), then use final intonation, otherwise use comma
intonation.
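
The Module I pseudocode might be rendered in Python roughly as follows. This is a sketch under stated assumptions rather than the patented implementation: triples whose value is zero are skipped, the digit word preceding "hundred" always receives neutral intonation while "hundred" itself receives the selected intonation, and the ones-place word receives final intonation whenever it falls in the ones triple. The function name `build_word_list` and the use of plain strings for words and intonations are illustrative choices.

```python
ONES = ["", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]
TEENS = ["", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "ten", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def build_word_list(value):
    """Return a list of (word, intonation) pairs for 1 <= value <= 999,999,999."""
    if not 1 <= value <= 999_999_999:
        raise ValueError("value out of supported range")
    entries = []
    triples = [
        ("million", value // 1_000_000 % 1000, 1_000_000),
        ("thousand", value // 1000 % 1000, 1000),
        (None, value % 1000, 1),
    ]
    for name, triple, base in triples:
        if triple == 0:
            continue  # assumption: an all-zero triple adds no words
        in_ones_triple = name is None
        hundreds, tens, ones = triple // 100, triple // 10 % 10, triple % 10

        def tone(ends_triple):
            # "final" only when the word ends the ones triple
            return "final" if ends_triple and in_ones_triple else "neutral"

        if hundreds:
            # assumption: the digit before "hundred" is always neutral
            entries.append((ONES[hundreds], "neutral"))
            entries.append(("hundred", tone(tens == 0 and ones == 0)))
        if tens == 1 and ones != 0:
            entries.append((TEENS[ones], tone(True)))
        else:
            if tens:
                entries.append((TENS[tens], tone(ones == 0)))
            if ones:
                entries.append((ONES[ones], tone(True)))
        if name is not None:
            # final if nothing follows the placeholder, otherwise comma
            entries.append((name, "final" if value % base == 0 else "comma"))
    return entries
```

For example, `build_word_list(1200)` would yield the words one, thousand, two, hundred, with "thousand" carrying comma intonation and "hundred" carrying final intonation.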

Module II: Construct Sample List 

The function of this module is to fill the sample list
described earlier with codes corresponding to available
samples.

The module proceeds through the word list and prosodic 
environment list one by one (in the order they were added to 
it), and selects a sample for each one according to a set of 
rules which may be sensitive to any of the following:

the identity of this word; 

the prosodic environment of this word; 

the phonological features of the preceding word (as
discovered by looking them up in the phonological feature
table).

Likewise, the phonological features of the following
word.
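
A minimal sketch of Module II follows, assuming the available samples are indexed by word, prosodic environment, the ending feature of the preceding word and the beginning feature of the following word. The feature table, the sample codes and the fallback order (most specific key first, then progressively less specific keys) are hypothetical choices made for illustration; the specification leaves the precise rules open.

```python
# Hypothetical feature table: word -> (beginning feature, ending feature).
FEATURES = {
    "six": ("s", "s"),
    "seven": ("s", "other"),
    "two": ("other", "vowel"),
}

# Hypothetical sample index:
# (word, prosody, preceding word's end, following word's beginning) -> code.
SAMPLE_INDEX = {
    ("six", "neutral", None, "s"): 101,
    ("six", "neutral", None, None): 100,
    ("seven", "final", "s", None): 201,
    ("seven", "final", None, None): 200,
    ("two", "final", None, None): 300,
}


def build_sample_list(word_list):
    """Walk (word, prosody) pairs in order and pick a sample code for each."""
    samples = []
    for i, (word, prosody) in enumerate(word_list):
        prev_end = FEATURES[word_list[i - 1][0]][1] if i > 0 else None
        next_begin = (
            FEATURES[word_list[i + 1][0]][0] if i + 1 < len(word_list) else None
        )
        # Try the most specific context first, then fall back.
        candidates = (
            (word, prosody, prev_end, next_begin),
            (word, prosody, prev_end, None),
            (word, prosody, None, next_begin),
            (word, prosody, None, None),
        )
        for key in candidates:
            if key in SAMPLE_INDEX:
                samples.append(SAMPLE_INDEX[key])
                break
        else:
            raise KeyError(f"no sample for {word!r} with prosody {prosody!r}")
    return samples
```

With this index, the list `[("six", "neutral"), ("seven", "final")]` selects the variant of "six" recorded before a word beginning with S and the variant of "seven" recorded after a word ending with S.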

What is claimed is: 

1. A high quality concatenative reading system for
converting an input string into a sequence for audible synthesis,
comprising:

a dictionary of complete word speech samples
corresponding to entire words stored in a computer-readable
medium;

a word list generator receptive of said input string for
building and storing word list tokens in a word list, the
word list generator building said word list from words
stored in said dictionary that correspond to the input
string;

said word list generator further having a list of prosodic
environment tokens representing a plurality of
intonation types, said word list generator assigning at least
one of said prosodic environment tokens to at least
some of the word list tokens;

a phonological feature analyzer that analyzes said word list
tokens and said assigned prosodic environment tokens
and selects said complete word speech samples from
said dictionary to build a sample list based on (a) the
word list tokens, (b) the prosodic environment tokens
and (c) the phonological features of adjacent words;

output for concatenatively supplying said sample list to an
analog conversion unit to produce an audible
text-to-speech signal.

2. The reading system of claim 1 wherein the word list
generator is further operable to add numeric placeholder
words corresponding to integers in said input string.

3. The reading system of claim 1 wherein said set of
speech samples includes a speech sample entry for each of
said plurality of intonation types.

4. The reading system of claim 1 wherein said word list
generator builds said word list as ordered pairs, each pair
comprising a word token and a prosodic environment token.

5. The reading system of claim 1 wherein said
phonological feature analyzer examines at least the word preceding
an entry in the word list to determine the phonological
features of adjacent words.

6. The reading system of claim 1 wherein said
phonological feature analyzer examines at least the word following
an entry in the word list to determine the phonological
features of adjacent words.

7. A method of text-to-speech conversion, comprising:

receiving an input string representing text to be converted
into audible synthesized speech;

constructing a word list of word tokens corresponding to
the input string by accessing a dictionary of complete
word speech samples corresponding to entire words
stored in a computer-readable medium;

supplementing said word list with prosodic environment
tokens that represent different intonation types, such
that at least some of the word tokens in said word list
are associated with a corresponding prosodic
environment token;

analyzing the phonological attributes associated with the
word tokens in said word list by examining the
phonological features of adjacent words in said list;

selecting complete word speech samples from a
predetermined dictionary of complete word speech
samples corresponding to entire words based on (a)
said word list tokens, (b) said corresponding prosodic
environment tokens, and (c) said phonological
attributes; and

building a sample list of said selected complete word
speech samples and supplying said sample list for
concatenative output to an analog conversion unit to
produce an audible text-to-speech signal.

8. The method of claim 7 wherein the step of constructing
said word list includes adding numeric placeholder words
corresponding to integers in said input string.

9. The method of claim 7 wherein said set of speech
samples includes a speech sample entry for each of said
different intonation types.

10. The method of claim 7 wherein said step of building
a word list comprises building said word list as ordered
pairs, where each pair comprises a word token and a
prosodic environment token.

11. The method of claim 7 wherein said step of analyzing
the phonological attributes comprises examining at least the
word preceding an entry in the word list to determine the
attribute based on phonological features of the preceding
word.

12. The method of claim 7 wherein said step of analyzing
the phonological attributes comprises examining at least the
word following an entry in the word list to determine the
attribute based on phonological features of following word.

* * * * *